Improving Grapheme Codebook Selection for Scribe Identification

Authors

  • Tara Gilliam
  • Richard C. Wilson
  • John A. Clark
Abstract

In this paper we test several approaches to analysing grapheme codebook features for offline writer identification in medieval English scribal manuscripts. Current methods for selecting a codebook typically produce codebooks that perform no better than random grapheme selection, so our aim in this analysis is to identify potential methods of improving codebook selection. Three feature extraction methods are tested, and a number of feature selection methods are proposed and compared. Results show that PCA-based selection and a broad range of grapheme similarities perform best, while reducing computation time by a factor of four. All methods are compared on a modern dataset and a medieval dataset with very different characteristics; the results are robust to data variation.
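
The abstract does not say how the PCA-based selection is computed, so the following is only a minimal sketch of one common way to use PCA for feature selection, not the authors' procedure. It assumes each writing sample is described by a histogram over codebook graphemes (rows are samples, columns are codebook entries) and ranks each codebook entry by its variance-weighted squared loading on the leading principal components. The function names `pca_rank_features` and `select_codebook`, the parameter `n_components`, and the synthetic data are all hypothetical.

```python
import numpy as np


def pca_rank_features(X, n_components=2):
    """Rank the columns of X by variance-weighted squared PCA loadings.

    X is assumed to be (n_samples, n_codebook_entries). Returns the column
    indices sorted from highest to lowest score, plus the raw scores.
    """
    Xc = X - X.mean(axis=0)  # centre each codebook entry
    # Rows of Vt are the principal axes; s holds the singular values.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    weights = s[:k] ** 2  # proportional to the variance explained per axis
    scores = (weights[:, None] * Vt[:k] ** 2).sum(axis=0)
    return np.argsort(scores)[::-1], scores


def select_codebook(X, n_keep, n_components=2):
    """Return the sorted indices of the n_keep highest-scoring codebook entries."""
    order, _ = pca_rank_features(X, n_components)
    return np.sort(order[:n_keep])


if __name__ == "__main__":
    # Synthetic stand-in for grapheme histograms (an assumption for the demo):
    # the first two entries vary strongly across samples, the rest barely at all.
    rng = np.random.default_rng(0)
    scale = np.array([5.0, 5.0] + [0.1] * 8)
    X = rng.normal(size=(40, 10)) * scale
    print(select_codebook(X, n_keep=2))
```

Keeping only the selected columns shrinks the histogram that later distance computations operate on, which is one plausible route to the kind of computation-time reduction the abstract reports.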

Similar resources

A fast search method of speaker identification for large population using pre-selection and hierarchical matching

This paper investigates search performance during the matching phase of a speaker identification system realized through vector quantization (VQ). The voice of each person is recorded in an office room with personal computers. LPC-cepstrum is selected as the feature vector. In order to gain a higher identification success rate, it is necessary to use a larger codebook for each person. Consequen...

Improving automatic writer identification

State-of-the-art systems for automatic writer identification from handwritten text are based on two approaches: a statistical approach or a model-based approach. Both approaches have limitations. The main limitation of the statistical approach is that it relies on single-scale statistical features. The main limitation of the model-based approach is that the codebook generation is time-consuming...

Predicting the scribe behind a page of medieval handwriting

This paper addresses the issue of attributing pieces of medieval handwriting to scribes known from other examples of writing. The system is applied to manuscript page images and performs extraction and comparison of letter shapes. Letters and sequences of connected letters are identified by means of connected component labeling. This is followed by further splitting into letter-size pieces. The...

SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy

In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....

Category-based phoneme-to-grapheme transliteration

Grapheme-based speech recognition systems are faster to develop but typically do not reach the same level of performance as phoneme-based systems. In this paper we introduce a technique for improving the performance of standard grapheme-based systems. We find that by handling a relatively small number of irregular words through phoneme-to-grapheme (P2G) transliteration – transforming the origin...

Publication date: 2011